COMPUTER-INTENSIVE RATE ESTIMATION, DIVERGING
STATISTICS AND SCANNING 

By Tucker McElroy and Dimitris N. Politis 

A general rate estimation method is proposed that is based on
studying the in-sample evolution of appropriately chosen diverging/con- 
verging statistics. The proposed rate estimators are based on simple 
least squares arguments, and are shown to be accurate in a very gen- 
eral setting without requiring the choice of a tuning parameter. The 
notion of scanning is introduced with the purpose of extracting useful 
subsamples of the data series; the proposed rate estimation method 
is applied to different scans, and the resulting estimators are then 
combined to improve accuracy. Applications to heavy tail index es- 
timation as well as to the problem of estimating the long memory 
parameter are discussed; a small simulation study complements our 
theoretical results. 

1. Introduction. Let $X_1, \ldots, X_n$ be an observed stretch from a general
time series $\{X_t\}$ that is not necessarily linear or stationary. A number of
converging and/or diverging statistics can be computed from a dataset of 
this type. In many instances, however, the rate of convergence/divergence of 
some statistics of interest may be unknown, that is, it may depend on some 
unknown feature of the underlying probability law P. This rate is often a 
quantity of direct interest; for example, it may be connected to the heavy 
tail index, the long memory or self-similarity parameter, and so on. 

For each given context, that is, choice of statistic and assumptions on the 
time series $\{X_t\}$, a context-specific rate estimator may be devised and its
properties analyzed. By contrast, a general approach for rate estimation has 
been given in the subsampling literature where knowledge/estimation of the 
rate is necessary for the construction of confidence intervals, hypothesis tests, 
and so on; see Bertail, Politis and Romano [3] or Politis, Romano and Wolf 
[19], Chapter 8. The subsampling rate estimator is based on evaluating the 
statistic of interest over subsamples of different size; subsequently, the rate of 
convergence/divergence is gauged by the effect incurred on the distribution
of the statistic when the subsample size varies. 

The subsampling rate estimator is consistent under very weak conditions. 
Nevertheless, a typical condition assumed in connection with subsampling 
is the strong mixing condition which may preclude its applicability in set- 
tings where the time series exhibits long-range dependence. In addition, the 
subsample size must be carefully chosen for optimal results; in general, this 
is a difficult problem analogous to the notorious bandwidth choice problem 
in nonparametric smoothing; see Politis, Romano and Wolf [19], Chapter 9. 

In this paper, a different noncontext-specific rate estimation method is in- 
troduced based on studying the in-sample evolution of appropriately chosen 
converging/diverging statistics. The proposed rate estimator is based on a 
simple least squares argument and is shown to be consistent in a very general 
setting that does not require the strong mixing assumption. Furthermore, 
no "bandwidth-type" selection is required for the new estimator. 

In order to improve the accuracy of this general estimation method, the 
notion of scanning a sequence is introduced. The proposed rate estimation
method is implemented over different "scans" of the data sequence
$X_1, \ldots, X_n$, and the resulting estimators are then combined to yield an
improved estimator in the spirit of the "bagging" aggregation of Breiman [4].

In the next section a motivating example is given in the setup of estimation 
of the heavy tail index with data from a linear time series model. Section 
3 introduces the general rate estimation methodology based on statistics 
that converge/diverge without centering; the important notion of scanning 
a sequence is also introduced. In Section 4 the methodology is extended to 
cover the case of statistics that require centering in order to converge. An 
application to the problem of estimating the long memory parameter of a 
long-range dependent time series is given in Section 5, together with a novel 
application combining heavy tails and long-range dependence. The setup of 
Section 2 is revisited in Section 6 by means of a finite-sample simulation; all 
proofs are deferred to the Appendix. 

2. A motivating example: the heavy tail index. 

2.1. A heavy-tailed linear time series. Throughout this section (and this 
section only) we will assume that the data $X_1, \ldots, X_n$ are an observed stretch
of a linear time series satisfying $X_t = \sum_{j \in \mathbb{Z}} \psi_j Z_{t-j}$ for all $t \in \mathbb{Z}$, where
$\{Z_t\}$ is i.i.d. from some distribution $F \in D(\alpha)$. The filter coefficients $\{\psi_j\}$
are assumed to be absolutely summable, and $D(\alpha)$ denotes the domain of
attraction of an $\alpha$-stable law with $\alpha \in (0, 2]$; see, for example, Embrechts,
Klüppelberg and Mikosch [8], Chapter 2.

In this context, it is well known that there exist sequences $a_n$ and $b_n$ such
that $a_n^{-1}\left(\sum_{t=1}^{n} Z_t - b_n\right) \Rightarrow S_\alpha$, where $S_\alpha$ denotes a generic $\alpha$-stable law
with unspecified scale, location and skewness; recall that $a_n = n^{1/\alpha} L(n)$ for
some slowly varying function $L(\cdot)$. The centering sequence $b_n$ can be taken
to be zero if either $\alpha < 1$, or $\alpha > 1$ and $Z_t$ has mean zero. When $\alpha = 1$, we
can only let $b_n = 0$ if $Z_t$ is symmetric about zero.

Our goal is estimation of $\alpha$, which is tantamount to estimation of the main
part of the rate $a_n$; the shape of the unknown slowly varying function $L(\cdot)$
is thus considered a nuisance parameter. Tail index estimators typically are
based upon a number $q$ of extreme order statistics, such as the well-known
Hill estimator; see Csörgő, Deheuvels and Mason [5] and Csörgő and Viharos
[6]. A practical problem for these estimators is choosing the number $q$ of
order statistics to be used; while it is known that we must have $q \to \infty$
and $q/n \to 0$ as $n \to \infty$ to ensure consistency, the optimal choice of $q$ in any
given finite sample situation is challenging; see, for example, Danielsson et
al. [7] and the references therein.

An alternative tail index estimator that is not based on order statistics 
has been recently proposed in the subsampling literature; see Bertail, Politis 
and Romano [3] and Politis, Romano and Wolf [19], Chapter 8. The sub- 
sampling tail index estimator is consistent under very general conditions; 
interestingly, it shares with Hill's estimator the difficulty of having to choose
a "bandwidth"-type parameter, namely, the subsample size. It is of interest
to construct a general rate estimator that is free from this difficulty of a
"bandwidth"-type selection.

2.2. A simple tail index estimator. Let $S_n^2 = n^{-1} \sum_{t=1}^{n} X_t^2$, and note that
when $\alpha \in (0, 2)$ it follows that

(1) $n^{-2/\alpha} L(n) \sum_{t=1}^{n} X_t^2 \Rightarrow J$,

where $L(\cdot)$ is a slowly varying function and $J$ has a positively skewed $S_{\alpha/2}$
distribution; see, for example, McElroy and Politis [14]. When $\alpha = 2$, the
expression (1) is valid if $Z_t$ has finite variance, by the law of large
numbers.

Let $Y_k = \log S_k^2$, and $U_k = Y_k - \gamma \log k + \log L(k)$ for $k = 1, \ldots, n$, where
$\gamma = -1 + 2/\alpha$. Then it is immediate that (1) implies that $U_n = O_P(1)$. The
relation $Y_k = \gamma \log k + U_k - \log L(k)$ suggests that $\gamma$ could plausibly
be estimated as the slope of a regression of $Y_k$ on $\log k$, with a resulting
estimator for $\alpha$. The reason that we treat $\log L(k)$ as "approximately
constant" in the regression of $Y_k$ on $\log k$ is given in Proposition 2.1 below.

Proposition 2.1. Any slowly varying function $L(\cdot)$ satisfies

(2) $\log L(k) = o(\log k)$ as $k \to \infty$.

So, let $\hat\gamma$ and $\tilde\gamma$ be the slope estimators in least squares (LS) regression
of $Y_k$ on $\log k$ with and without an intercept term, respectively, and define
$\hat\alpha = 2/(\hat\gamma + 1)$ and $\tilde\alpha = 2/(\tilde\gamma + 1)$. A rough estimate of slope in the
regression without an intercept is simply the ratio $Y_n/\log n$; see Meerschaert and
Scheffler [16].

Proposition 2.2. If (1) is true, then $\tilde\alpha \xrightarrow{P} \alpha$ as $n \to \infty$.

Proposition 2.2, whose proof follows from the more general Theorem 3.1
in the next section, remains true even in the case $\alpha = 2$ as long as (1) holds;
see the discussion in Section 6.

The rate of convergence of $\tilde\alpha$ can be quite slow. To get a more accurate
estimator, a permutation/averaging technique was proposed in Politis [17].
However, permutations are only justified in the special case when the $X_t$
data are i.i.d.; to address the general scenario of dependent data, the notion
of scanning is introduced in Section 3.2 and will be used in connection with
an estimator of the type of $\hat\alpha$. Intuitively, including an intercept term in the
regression offers an improvement, as it captures the nonzero large-sample
expectation of $U_k$, as well as the influence of the term $\log L(k)$.
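
The two regressions of Section 2.2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `loglog_slopes` and `tail_index_estimates` are hypothetical, all points $k = 1, \ldots, n$ enter the regression, and the first observation is assumed nonzero so that $\log S_1^2$ is finite. The toy input is deterministic and chosen so that $S_k^2 = k$, which corresponds to slope 1 and hence a tail index of 1.

```python
import math

def loglog_slopes(y):
    """Least squares slopes of y[k-1] on log k for k = 1..n (n >= 2).

    Returns (slope without intercept, slope with intercept).
    """
    n = len(y)
    logs = [math.log(k) for k in range(1, n + 1)]
    no_intercept = sum(a * b for a, b in zip(y, logs)) / sum(b * b for b in logs)
    y_bar = sum(y) / n
    log_bar = sum(logs) / n
    with_intercept = (sum((a - y_bar) * (b - log_bar) for a, b in zip(y, logs))
                      / sum((b - log_bar) ** 2 for b in logs))
    return no_intercept, with_intercept

def tail_index_estimates(x):
    """Tail index estimates 2 / (slope + 1) from Y_k = log S_k^2.

    Returns (estimate without intercept, estimate with intercept).
    """
    y, cum = [], 0.0
    for k, xt in enumerate(x, start=1):
        cum += xt * xt                 # running sum of squares
        y.append(math.log(cum / k))    # Y_k = log of second sample moment
    slope_no_int, slope_int = loglog_slopes(y)
    return 2.0 / (slope_no_int + 1.0), 2.0 / (slope_int + 1.0)

# Toy data (hypothetical): x_k^2 = 2k - 1, so the sum of squares up to k is k^2
# and S_k^2 = k, giving slope 1 in both regressions.
x = [math.sqrt(2 * k - 1) for k in range(1, 201)]
alpha_no_intercept, alpha_with_intercept = tail_index_estimates(x)
```

On real heavy-tailed data the two estimates generally differ, since only the regression with an intercept absorbs a constant offset in $Y_k$.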

3. The general rate estimation methodology. 

3.1. Statistics that converge or diverge without centering. We outline be- 
low the basic rate estimation method and show its consistency under general 
conditions. 

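(a) Let $T_n = T_n(X_1, \ldots, X_n)$ be some positive statistic whose rate of
convergence/divergence depends on some unknown real-valued parameter $\lambda$.

(b) Assume that for some slowly varying function $L(n)$ and for some
known invertible function $g(\cdot)$ that is continuous over an interval that
contains $\lambda$, we have $U_n = O_P(1)$ as $n \to \infty$, where

(3) $U_k = \log\left(k^{-g(\lambda)} L(k) T_k\right)$ for $k = 1, \ldots, n$.

(c) Estimate $g(\lambda)$ by
$\hat g = \frac{\sum_{k=1}^{n} (Y_k - \bar Y)(\log k - \overline{\log}_n)}{\sum_{k=1}^{n} (\log k - \overline{\log}_n)^2}$, and $\lambda$ by $\hat\lambda = g^{-1}(\hat g)$,
where $Y_k = \log T_k$ for $k = 1, \ldots, n$, $\bar Y = \frac{1}{n} \sum_{k=1}^{n} Y_k$ and $\overline{\log}_n = \frac{1}{n} \sum_{k=1}^{n} \log k$.
Alternatively, estimate $g(\lambda)$ by $\tilde g = \sum_{k=1}^{n} Y_k \log k \big/ \sum_{k=1}^{n} \log^2 k$, and $\lambda$ by
$\tilde\lambda = g^{-1}(\tilde g)$.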

To study $\hat\lambda$ and $\tilde\lambda$, the following additional assumptions will be useful:

(4) $U_n \Rightarrow$ some r.v. $U$, with $EU_n^2 \to EU^2$,

(5) $EU_n - EU = O(n^{-p})$ for some $p > 0$,

and

(6) $\mathrm{Cov}(U_b, U_n) = O\left(b^{\gamma_1} n^{-\gamma_2} L(n)\right)$ for $b < n$ and some $0 < \gamma_1 < \gamma_2$,

where $L$ is some slowly varying function. Equations (4), (5) and/or (6)
can be verified under some assumptions in the setting of Section 2; see
www.math.ucsd.edu/~politis/PAPER/scansAppendix2.pdf for details.

We are now able to state a general asymptotic result on $\hat\lambda$ and $\tilde\lambda$. Theorem
3.1 below is a more general (and corrected) version of results in Politis [17]
that were worked out under the assumption that the slowly varying function
is a constant.

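Theorem 3.1. Assume statements (a), (b), (c) are true.

(i) Then $\tilde\lambda \xrightarrow{P} \lambda$ as $n \to \infty$.

(ii) If assumption (4) holds, then $E\hat g \to g(\lambda)$ and $\mathrm{Var}(\hat g) = O(1)$ as $n \to \infty$.

(iii) If assumption (5) holds, then $E\hat g = g(\lambda) + A_1 + A_2$, where
$A_1 = O(n^{-p} \log n)$ and
$A_2 = -\frac{\sum_{k=1}^{n} (\log L(k) - \overline{\log L})(\log k - \overline{\log}_n)}{\sum_{k=1}^{n} (\log k - \overline{\log}_n)^2}$,
with $\overline{\log L} = \frac{1}{n} \sum_{k=1}^{n} \log L(k)$.

(iv) If assumptions (4) and (6) hold, then $\mathrm{Var}(\hat g) = o(1)$ and $\hat\lambda \xrightarrow{P} \lambda$.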

Remark 3.1. Note that the estimators $\hat g$ and $\tilde g$ correspond to $L_2$
regression estimators of slope (with or without an intercept). However, an
$L_1$ regression estimator of slope would be a robust alternative which is
expected to also be consistent and perhaps even more reliable, especially if the
large-sample distribution of the $U_k$ has heavier tails than the normal.

Remark 3.2. The assumption $U_n = O_P(1)$ in statement (b) would typically
be verified by proving a limit theorem of the type

(7) $n^{-g(\lambda)} L(n) T_n \Rightarrow J$ as $n \to \infty$,

where $J$ is some well-defined probability distribution. Therefore, the
implication of the assumption $U_n = O_P(1)$ is that if $g(\lambda) > 0$, then $T_n$ diverges
to $\infty$, whereas if $g(\lambda) < 0$, then $T_n$ converges to 0 in probability; the case
$g(\lambda) = 0$ roughly corresponds to the case where the uncentered distribution
of $T_n$ converges in law to some nondegenerate distribution. Unless $g(\lambda) = 0$,
$Y_k = \log T_k$ diverges to either $+\infty$ or $-\infty$ as the block size $k$ increases. In
addition, note that centering can typically be omitted only when $T_n$ is a
diverging statistic, in particular, when the centering is constant or grows at a
slower rate than the scale of $T_n$. Thus, most applications of Theorem 3.1 are
expected to be in cases where $g(\lambda) > 0$. However, this rule is not absolute,
as Remark 3.3 suggests.

Remark 3.3. In the setting of Section 2 the parameter $\lambda$ would be the
heavy tail index $\alpha$, and $T_n$ could well be the second sample moment $S_n^2$; in
that case, $g(\lambda) = -1 + 2/\lambda$. Note, however, that the diverging statistic $S_n^2$ can
be turned into a statistic that converges to zero by appropriate shrinking.
For example, if $\alpha > 1$, then the statistic $T'_n = S_n^2/n$ converges weakly to zero,
and $\log T'_n$ to $-\infty$; thus, the choice of $U_n$ based on the statistic $T'_n$ ensuring
$U_n = O_P(1)$ is identical to the $U_n$ corresponding to the diverging statistic
$S_n^2$, and the resulting estimator of $\alpha$ is the same in both cases, which is
reassuring. In essence, these are not really separate cases; since
$\log(n^d T_n) = d \log n + \log T_n$, multiplying
the statistic $T_n$ by $n^d$ leads to the same log-log regression.
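
The closing observation of Remark 3.3 can be checked numerically: replacing $T_k$ by $k^d T_k$ shifts both least squares slopes by exactly $d$, so the implied rate parameter is unchanged once $g$ is adjusted by the same $d$. The following Python sketch is illustrative only; the helper name `slopes` and the toy sequence `t` are assumptions made for the example.

```python
import math

def slopes(y):
    """Slopes of y[k-1] on log k, k = 1..n: (without intercept, with intercept)."""
    n = len(y)
    logs = [math.log(k) for k in range(1, n + 1)]
    no_int = sum(a * b for a, b in zip(y, logs)) / sum(b * b for b in logs)
    y_bar, log_bar = sum(y) / n, sum(logs) / n
    with_int = (sum((a - y_bar) * (b - log_bar) for a, b in zip(y, logs))
                / sum((b - log_bar) ** 2 for b in logs))
    return no_int, with_int

n, d = 300, -1.0
t = [2.0 * k ** 0.75 for k in range(1, n + 1)]        # hypothetical values of T_k
y = [math.log(v) for v in t]                           # log T_k
y_shifted = [math.log(k ** d * v)                      # log(k^d T_k)
             for k, v in enumerate(t, start=1)]

base = slopes(y)
shifted = slopes(y_shifted)
# Each shifted slope equals the corresponding base slope plus d.
differences = (shifted[0] - base[0], shifted[1] - base[1])
```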

Remark 3.4. The validity of the regression of $Y_k$ on $\log k$ is based on
asymptotic assumptions such as $U_n = O_P(1)$ or $U_n \Rightarrow U$, line (2), and so on.
Hence, the $(Y_k, \log k)$ points may not be very informative if $k$ is small, and it
may be advisable in practice to drop some points from the regression, much
in the same manner as some points are invariably dropped in the beginning
of a Markov chain simulation. In other words, one would regress $Y_k$ on $\log k$
for $k = n_0, \ldots, n$, for some $n_0$ chosen either as constant or even as a function
of $n$ but such that $n - n_0 \to \infty$ without affecting the asymptotic consistency
of $\hat\lambda$ or $\tilde\lambda$. Thus, choosing $n_0$ here is not a bandwidth-choice problem, and
the choice $n_0 = 1$ is definitely a valid one; the reason is that the log-log
scatterplot is very sparse for points with $k$ small, and therefore, such points
have little influence collectively.
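
The general recipe of statements (a)-(c), together with the option of dropping the first few points described in Remark 3.4, might be sketched as follows. This is a hedged illustration rather than the authors' code: `rate_estimate`, its `n0` and `intercept` arguments, and the toy sequence are hypothetical, and the inverse map `g_inverse` corresponds to the choice $g(\lambda) = -1 + 2/\lambda$ from Section 2.

```python
import math

def rate_estimate(t_values, g_inverse, n0=1, intercept=True):
    """Estimate lambda from positive statistics t_values[k-1] = T_k, k = 1..n.

    Regresses log T_k on log k over k = n0..n and maps the slope through
    g_inverse. With intercept=False the regression line is forced through
    the origin.
    """
    points = [(math.log(k), math.log(t))
              for k, t in enumerate(t_values, start=1) if k >= n0]
    if intercept:
        log_bar = sum(l for l, _ in points) / len(points)
        y_bar = sum(y for _, y in points) / len(points)
        slope = (sum((y - y_bar) * (l - log_bar) for l, y in points)
                 / sum((l - log_bar) ** 2 for l, _ in points))
    else:
        slope = (sum(y * l for l, y in points)
                 / sum(l * l for l, _ in points))
    return g_inverse(slope)

def g_inverse(s):
    """Inverse of g(lambda) = -1 + 2/lambda."""
    return 2.0 / (s + 1.0)

# Toy sequence (hypothetical): T_k = 3 k^{1/2}, so g(lambda) = 1/2, lambda = 4/3.
t_values = [3.0 * k ** 0.5 for k in range(1, 501)]
lam_with_intercept = rate_estimate(t_values, g_inverse, n0=10)
lam_no_intercept = rate_estimate(t_values, g_inverse, n0=10, intercept=False)
```

In this toy case the constant factor 3 is absorbed by the intercept, so the first estimate recovers 4/3, while the regression through the origin is pulled away from it. That mirrors the intuition given in Section 2 for including an intercept.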

Theorem 3.1 shows that $\tilde\lambda$ is consistent under minimal assumptions,
essentially the $U_n = O_P(1)$ assumption of statement (b). Nevertheless, the rate
of convergence of $\tilde\lambda$ may be very slow, essentially of logarithmic order.
Intuitively, as mentioned in Section 2, the estimator $\hat\lambda$ should be more accurate
than $\tilde\lambda$; this is indeed true at the expense of the additional assumptions (4),
(5) and (6). For example, it is immediate that the bias of $\hat\lambda$ will tend to zero
at a polynomial rate under some conditions on the slowly varying function
$L$, for example, when $L$ is constant. However, no rate for the variance of
$\hat\lambda$ was given in Theorem 3.1. Furthermore, if assumption (6) fails and/or
cannot be verified, the rough bound $\mathrm{Var}(\hat\lambda) = O(1)$ ensues by the delta
method. Therefore, a technique to reduce the variance of $\hat\lambda$ is desirable; this
is accomplished in the next subsection via the notion of scanning a sequence.

3.2. Scanning a sequence. The rate estimation method introduced in
Section 3.1 is based on evaluating the statistic $T_k$ on subsets (blocks) of
growing size taken from the data set $X_1, \ldots, X_n$. Subsequently, the in-sample
evolution of the (logarithm of the) statistic $T_k$ is studied. This method is
closely related to subsampling since our statistic is evaluated on
subsamples/subseries of the data. The only difference is that here we consider blocks
of all sizes as opposed to one preferred block size; as a matter of fact, here we
have one block for each block size $k = 1, \ldots, n$. As in subsampling, the crux
of the method outlined in Section 3.1 lies in the fact that $T_k$ and $T_n$ should
behave similarly (when appropriately normalized); see Politis and Romano
[18] or Politis, Romano and Wolf [19] for more details on the subsampling
methodology, and Barbe and Bertail [1] in connection with the study of
subsamples of increasing size.

To fix ideas, assume that the time series $\{X_t\}$ is strictly stationary. In that
case, it is apparent that the statistic $T_k$ should behave in the same fashion
when applied to any stretch of size $k$ of consecutive data points extracted
from the data series $X_1, \ldots, X_n$; this observation motivates the notion of
"scanning." On top of the particular application that will become obvious 
immediately, scanning may also provide an alternative way to think about 
the usual expanding sample asymptotics for stationary time series. 

Definition 3.1. A scan is a collection of $n$ block-subsamples of the
sequence $X_1, \ldots, X_n$ with the following two properties: (a) within each scan
there is a single block of each size $k = 1, \ldots, n$; and (b) those $n$ blocks are
nested, that is, the block of size $k_1$ can be found as a stretch within the
block of size $k_2$ when $k_1 < k_2$.

As usual, a block-subsample of the sequence $X_1, \ldots, X_n$ is a block of
consecutive observations, that is, a set of the type $X_j, X_{j+1}, \ldots, X_{j+m}$.

We will say that the sequence $X_1, \ldots, X_n$ has been scanned if a block
corresponding to each block size $k = 1, \ldots, n$ has been extracted, and if those
blocks are nested, that is, the block of size $k_1$ can be found as a stretch within
the block of size $k_2$ when $k_1 < k_2$. For example, in Section 3.1 the following
"direct" scan was employed:

$(X_1), (X_1, X_2), (X_1, X_2, X_3), \ldots, (X_1, \ldots, X_{n-1}), (X_1, \ldots, X_n)$,

over which the in-sample "evolution" of $T_n$ was investigated. Nevertheless,
there are many possible scans; for example, consider the "reverse" scan

$(X_n), (X_{n-1}, X_n), (X_{n-2}, X_{n-1}, X_n), \ldots, (X_2, \ldots, X_n), (X_1, \ldots, X_n)$.

In general, a scan will start at time-point $j$ (say) and then the blocks will
proceed growing/expanding to the left and/or to the right (hence the different
perspective on asymptotics); for example, a valid scan is

$(X_5), (X_4, X_5), (X_3, X_4, X_5), (X_3, X_4, X_5, X_6), \ldots, (X_1, \ldots, X_n)$;

note how within each block the natural time order is preserved, and how all 
scans end with the block containing the full data set. The number of possible 
scans is large as the following proposition shows. 

Proposition 3.1. There are $2^{n-1}$ different scans of the sequence $X_1, \ldots, X_n$
when no ties are present.

Let $B_k^i = (X_i, \ldots, X_{i+k-1})$, that is, $B_k^i$ for $i = 1, \ldots, n - k + 1$ are all
the possible blocks of size $k$. Pascal's triangle and a backward induction
argument suggest the following useful corollary.

Corollary 3.1. Among the $2^{n-1}$ different scans of the sequence $X_1, \ldots, X_n$
there are exactly $2^{k-1} \binom{n-k}{i-1}$ scans that contain block $B_k^i$ as their block of
size $k$ for $1 \le i \le n - k + 1$.
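
For small $n$ the counting statements above can be checked by brute force. The sketch below enumerates every scan by growing a block one position to the left or right, and counts how many scans pass through a given block. It is an illustration only: the function names are hypothetical, blocks are represented as 1-based `(left, right)` index pairs, and the closed form $2^{k-1}\binom{n-k}{i-1}$ used in the comparison comes from a direct counting argument ($2^{k-1}$ ways to build up the block, $\binom{n-k}{i-1}$ ways to order the remaining left and right extensions).

```python
from math import comb

def all_scans(n):
    """Return all scans of positions 1..n.

    Each scan is a tuple of n nested blocks (left, right), the k-th entry
    being the block of size k.
    """
    scans = []

    def grow(left, right, blocks):
        if right - left + 1 == n:
            scans.append(tuple(blocks))
            return
        if left > 1:                      # extend one position to the left
            grow(left - 1, right, blocks + [(left - 1, right)])
        if right < n:                     # extend one position to the right
            grow(left, right + 1, blocks + [(left, right + 1)])

    for start in range(1, n + 1):
        grow(start, start, [(start, start)])
    return scans

def count_scans_with_block(scans, i, k):
    """Number of scans whose block of size k is (X_i, ..., X_{i+k-1})."""
    return sum(1 for scan in scans if scan[k - 1] == (i, i + k - 1))

n = 6
scans = all_scans(n)
number_of_scans = len(scans)              # compare with 2 ** (n - 1)
through_block = count_scans_with_block(scans, 2, 3)   # block (X_2, X_3, X_4)
closed_form = 2 ** (3 - 1) * comb(n - 3, 2 - 1)
```

Full enumeration is only feasible for very small $n$, which is why randomly selected scans are used in practice.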

A collection of algorithms to generate randomly selected scans can be 
found at www.math.ucsd.edu/~politis/PAPER/scansAlgorithms.pdf, where 
some properties of those algorithms are also discussed. 

3.3. Improving upon the basic estimator. As mentioned before, the usage
of the particular "direct" scan

$(X_1), (X_1, X_2), (X_1, X_2, X_3), \ldots, (X_1, \ldots, X_{n-1}), (X_1, \ldots, X_n)$

in Section 3.1 was quite arbitrary; any scan could have been used with similar
results. To elaborate, consider all the $2^{n-1}$ different scans of the sequence
$X_1, \ldots, X_n$; order the scans in some arbitrary fashion, focus on the $I$th such
scan, and consider the following analogs of our previous statements (a)-(c).

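(a[I]) Let $T_n = T_n(X_1, \ldots, X_n)$ be some positive statistic whose rate of
convergence/divergence depends on some unknown real-valued parameter $\lambda$.

(b[I]) For $k = 1, \ldots, n$, let $T_k^{(I)}$ denote the value of the statistic $T_k$ as
computed from the block of size $k$ of the $I$th scan of the sequence $X_1, \ldots, X_n$.

(c[I]) Estimate $\lambda$ by $\hat\lambda^{(I)} = g^{-1}(\hat g^{(I)})$, or by $\tilde\lambda^{(I)} = g^{-1}(\tilde g^{(I)})$, where

$\hat g^{(I)} = \frac{\sum_{k=1}^{n} (Y_k - \bar Y)(\log k - \overline{\log}_n)}{\sum_{k=1}^{n} (\log k - \overline{\log}_n)^2}$, $\quad \tilde g^{(I)} = \frac{\sum_{k=1}^{n} Y_k \log k}{\sum_{k=1}^{n} \log^2 k}$,

and $Y_k = \log T_k^{(I)}$ for $k = 1, \ldots, n$, $\bar Y = \frac{1}{n} \sum_{k=1}^{n} Y_k$ and $\overline{\log}_n = \frac{1}{n} \sum_{k=1}^{n} \log k$.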

Theorem 3.2. Assume that the time series $\{X_t\}$ is strictly stationary.
Under the assumptions of Theorem 3.1, the conclusions of Theorem 3.1
remain true with $\hat\lambda^{(I)}$ and $\tilde\lambda^{(I)}$ in place of $\hat\lambda$ and $\tilde\lambda$, respectively, for any $I$.

Theorem 3.2, whose proof is identical to the proof of Theorem 3.1,
suggests an approach for potentially improving the estimators $\hat\lambda$ and $\tilde\lambda$ by
combining/averaging the estimators based on scans. Consider the estimators
$\hat\lambda^{(1)}, \ldots, \hat\lambda^{(N)}$ and $\tilde\lambda^{(1)}, \ldots, \tilde\lambda^{(N)}$ for some integer $N$, and define
$\hat\lambda^* = N^{-1} \sum_{I=1}^{N} \hat\lambda^{(I)}$ and $\tilde\lambda^* = N^{-1} \sum_{I=1}^{N} \tilde\lambda^{(I)}$. A different way of combining
estimators is given by the median; so, we also define
$\hat\lambda^{**} = \mathrm{median}\{\hat\lambda^{(1)}, \ldots, \hat\lambda^{(N)}\}$
and $\tilde\lambda^{**} = \mathrm{median}\{\tilde\lambda^{(1)}, \ldots, \tilde\lambda^{(N)}\}$. The median estimators $\hat\lambda^{**}$ and $\tilde\lambda^{**}$ will
exhibit similar variance reduction behavior as the mean estimators $\hat\lambda^*$ and
$\tilde\lambda^*$. However, the median may be preferable in practice because of its
robustness. The following corollary shows that averaging does not hurt
asymptotically.

Corollary 3.2. Assume that the time series $\{X_t\}$ is strictly stationary.

(i) Assume $N$ is fixed. Under the assumptions of Theorem 3.1, the
conclusions of Theorem 3.1 remain true with $\hat\lambda^*$ or $\hat\lambda^{**}$ in place of $\hat\lambda$, and $\tilde\lambda^*$
or $\tilde\lambda^{**}$ in place of $\tilde\lambda$.

(ii) Assume $N$ is a general positive function of $n$ (possibly diverging to
infinity as $n \to \infty$). Under the assumptions of Theorem 3.1, the conclusions
of Theorem 3.1 remain true with $\hat\lambda^*$ in place of $\hat\lambda$ and $\tilde\lambda^*$ in place of $\tilde\lambda$.
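
A rough sketch of the mean and median aggregation over randomly chosen scans is given below for the tail index setting of Section 2. Several parts are assumptions made for illustration and are not taken from the paper: the sampler `random_scan_order` (uniform starting point, then a fair coin to extend left or right) is a simple stand-in and is not claimed to match the algorithms referenced in the text or to weight all scans equally; the function names are hypothetical; and a scan is stored as the order in which positions are added, which suffices here because the second sample moment does not depend on the order of observations within a block.

```python
import math
import random
import statistics

def random_scan_order(n, rng):
    """Order in which positions 0..n-1 join a growing contiguous block."""
    left = right = rng.randrange(n)
    order = [left]
    while len(order) < n:
        can_left, can_right = left > 0, right < n - 1
        if can_left and (not can_right or rng.random() < 0.5):
            left -= 1
            order.append(left)
        else:
            right += 1
            order.append(right)
    return order

def scan_tail_index(x, order):
    """Tail index estimate 2/(slope + 1) from one scan (regression with intercept)."""
    logs, ys, cum = [], [], 0.0
    for k, idx in enumerate(order, start=1):
        cum += x[idx] ** 2                 # sum of squares over the block of size k
        logs.append(math.log(k))
        ys.append(math.log(cum / k))       # log of the block's second sample moment
    log_bar = sum(logs) / len(logs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((y - y_bar) * (l - log_bar) for y, l in zip(ys, logs))
             / sum((l - log_bar) ** 2 for l in logs))
    return 2.0 / (slope + 1.0)

def aggregated_tail_index(x, n_scans, rng):
    """Mean and median of the per-scan estimates over n_scans random scans."""
    estimates = [scan_tail_index(x, random_scan_order(len(x), rng))
                 for _ in range(n_scans)]
    return statistics.mean(estimates), statistics.median(estimates)

rng = random.Random(0)
# Standard Cauchy draws via the inverse CDF (tail index 1).
x = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(400)]
alpha_mean, alpha_median = aggregated_tail_index(x, 50, rng)
```

No truncation of individual estimates is applied in this sketch, so the mean can be affected by a few extreme per-scan values, whereas the median is less sensitive to them.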

It is generally difficult to quantify the variance reduction effect of scanning
estimators; nevertheless, the simulations in Section 6 show a pronounced
effect even with a small value of $N$. Note that $N$ is really tied to the
practitioner's computational facilities, and not so much to the sample size
$n$ or the number of scans $2^{n-1}$. The recommendation is to take $N$ as big
as computationally feasible; in practice, however, even taking $N$ as small as
100 gives a significant benefit especially if the $N$ scans under consideration
are very different from one another. A way to ensure this is to use N ran- 
domly selected scans from an algorithm that gives (close to) equal weight 
to each scan. A practical option is given by Algorithm A(/) or Algorithm 
B' — the latter being valid only for weakly dependent, stationary sequences; 
see www.math.ucsd.edu/~politis/PAPER/scansAlgorithms.pdf for details. 

4. Extensions of the basic methodology. 

4.1. Limit theorems with centering. As mentioned in Remark 3.2, center- 
ing can typically be omitted in the case of diverging statistics. By contrast, 
in most cases of converging statistics a centering will be necessary in order 
to transform T n into a bounded random variable (in probability). Therefore, 
the following extension of the rate estimation methodology of Section 3 is 
proposed. 

(a') Let $T_n = T_n(X_1, \ldots, X_n)$ be some (not necessarily positive) statistic
whose rate of convergence depends on some unknown real-valued parameter
$\lambda$. Also assume that $P(T_k = T_n) = 0$ for $k = 1, \ldots, n - 1$.




(b') Assume that for some slowly varying function $L(n) > 0$ and for some
known invertible function $g(\cdot)$ that is continuous over an interval that
contains $\lambda$, and such that $g(\lambda) < 0$, we have

(8) $n^{-g(\lambda)} L(n) |T_n - \mu| \Rightarrow J$ as $n \to \infty$,

where $\mu$ is a real-valued parameter and $J$ some well-defined probability
distribution; both $\mu$ and the shape of the limit distribution $J$ can be unknown.

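(c') Let $m$, $b$ be positive integers with $m < n - b$ and $b < n$; as before,
we estimate $g(\lambda)$ by
$\hat g_{m,b} = \frac{\sum_{k=m}^{m+b} (Y_k - \bar Y)(\log k - \overline{\log})}{\sum_{k=m}^{m+b} (\log k - \overline{\log})^2}$,
and $\lambda$ by $\hat\lambda_{m,b} = g^{-1}(\hat g_{m,b})$, where $Y_k = \log |T_k - T_n|$, and $\bar Y$ and
$\overline{\log}$ denote the averages of $Y_k$ and $\log k$ over $k = m, \ldots, m + b$.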
Note that $\hat g_{m,b}$ in the above is an $L_2$ regression estimator of slope. As in
Remark 3.1, here too it should be stressed that an $L_1$ estimator of slope in
the regression of $Y_k$ on $\log k$ for $k = m, \ldots, b + m$ (with an intercept term
included) might well give an attractive alternative that would be robust to
the possibility that one of the $T_k$'s happens to be very close to $T_n$.

Theorem 4.1. If statements (a'), (b') and (c') are true, and assumptions
(4) and (6) hold, then $\hat\lambda_{m,b} \xrightarrow{P} \lambda$, provided $1 \le m < n - b$ and $b \to \infty$
but $b + m = o(n)$ as $n \to \infty$.
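
The centered regression of statement (c') can be sketched as follows. This is a hedged illustration with hypothetical names, not the authors' code: the function takes the already computed values $T_1, \ldots, T_n$ and regresses $\log|T_k - T_n|$ on $\log k$ over the window $k = m, \ldots, m + b$, with an intercept. The toy input is deterministic, with $T_k = 5 + k^{-1/2}$ for $k < n$ and the last value set equal to the limit 5, so the slope should be close to $-1/2$.

```python
import math

def centered_rate_slope(t_values, m, b):
    """Slope of log|T_k - T_n| on log k for k = m..m+b (with intercept).

    t_values[k-1] holds T_k for k = 1..n; requires 1 <= m < n - b and b >= 1,
    and T_k != T_n on the window so that the logarithm is defined.
    """
    n = len(t_values)
    if not (1 <= m < n - b and b >= 1):
        raise ValueError("need 1 <= m < n - b and b >= 1")
    t_n = t_values[-1]
    ks = range(m, m + b + 1)
    logs = [math.log(k) for k in ks]
    ys = [math.log(abs(t_values[k - 1] - t_n)) for k in ks]
    log_bar = sum(logs) / len(logs)
    y_bar = sum(ys) / len(ys)
    return (sum((y - y_bar) * (l - log_bar) for y, l in zip(ys, logs))
            / sum((l - log_bar) ** 2 for l in logs))

# Toy sequence (hypothetical): converges to 5 at rate k^{-1/2}.
n = 1000
t_values = [5.0 + k ** -0.5 for k in range(1, n)] + [5.0]
g_hat = centered_rate_slope(t_values, m=5, b=100)
```

With real data $T_n$ is itself random, which is one reason the window must satisfy $b + m = o(n)$: the blocks used in the regression need to be much shorter than the full sample.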

The assumption $P(T_k = T_n) = 0$ is imposed to ensure that $Y_k$ is well
defined; it follows easily if the distribution of the statistic $T_n$ is absolutely
continuous, in which case the probability of exact ties is zero. The condition
$P(T_k = T_n) = 0$ could actually be relaxed to $P(T_k = T_n) \to 0$ when
$k = k(n) \to \infty$ as $n \to \infty$ to accommodate the handling of statistics with discrete
distributions; the details are straightforward and are omitted.

Remark 4.1. Note that choosing $m$ is not a "bandwidth" selection
problem; the choice $m = 1$ is fine for Theorem 4.1, although, in practice,
one may prefer to take $m$ to be a small positive integer. Nevertheless, the
trade-off requirements $b \to \infty$ but $b + m = o(n)$ imply that choosing $b$ is
unfortunately a "bandwidth"-type problem. In this sense, rate estimation
for uncentered diverging statistics seems to be easier to deal with; see, for
example, Remark 3.2. To sidestep this difficulty, one may try to recast the
problem into a diverging setup. So if $T_n$ is nonnegative, and if a lower bound
for $g(\lambda)$ is known to exist [say $G < g(\lambda) < 0$], then line (8) implies that the
uncentered quantity n G ~ 9 ^ L(n)T n should be diverging to 00, and thus, the 
methods of Section 3 may be applicable; see Section 5.1 for an example of 
such a transformation. 

4.2. Improving upon the basic estimator. As before, the notion of scanning
may lead to improved estimation. Focus on the $I$th scan, and let $T_k^{(I)}$
denote the value of $T_k$ computed from the block of size $k$ of the $I$th scan of
the sequence $X_1, \ldots, X_n$. Estimate $g(\lambda)$ by

$\hat g_{m,b}^{(I)} = \frac{\sum_{k=m}^{m+b} (Y_k - \bar Y)(\log k - \overline{\log})}{\sum_{k=m}^{m+b} (\log k - \overline{\log})^2}$,

and $\lambda$ by $\hat\lambda_{m,b}^{(I)} = g^{-1}(\hat g_{m,b}^{(I)})$, where $Y_k = \log |T_k^{(I)} - T_n|$, and $\bar Y$ and
$\overline{\log}$ denote the averages of $Y_k$ and $\log k$ over $k = m, \ldots, m + b$. The
following theorem and corollary ensue with proof
identical to the proof of Theorem 4.1 combined with Corollary 3.2.

Theorem 4.2. Assume the time series $\{X_t\}$ is strictly stationary. If
statements (a'), (b') and (c') are true, and assumptions (4) and (6) hold,
then $\hat\lambda_{m,b}^{(I)} \xrightarrow{P} \lambda$, provided $1 \le m < n - b$ and $b \to \infty$ but $b + m = o(n)$ as
$n \to \infty$.

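To produce an improved estimator, we may again define

$\hat\lambda_{m,b}^{*} = N^{-1} \sum_{I=1}^{N} \hat\lambda_{m,b}^{(I)}$ and $\hat\lambda_{m,b}^{**} = \mathrm{median}\{\hat\lambda_{m,b}^{(1)}, \ldots, \hat\lambda_{m,b}^{(N)}\}$,

where $N$ is some fixed positive integer.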

Corollary 4.1. Assume the time series $\{X_t\}$ is strictly stationary. If
statements (a'), (b') and (c') are true, and assumptions (4) and (6) hold,
then $\hat\lambda_{m,b}^{*} \xrightarrow{P} \lambda$ and $\hat\lambda_{m,b}^{**} \xrightarrow{P} \lambda$, provided $1 \le m < n - b$ and $b \to \infty$ but
$b + m = o(n)$ as $n \to \infty$.

5. Two examples with long memory. The study of long memory time 
series appears to have been initiated by the hydrologist H. E. Hurst [12], who 
investigated the flow of the river Nile. Notably, Hurst's original R/S statistic 
was driven by a log-log regression as is our rate estimator $\hat\lambda$; see Beran [2] or
Giraitis, Robinson and Surgailis [10] and the references therein. Interestingly, 
the well-known Geweke and Porter-Hudak [9] estimator of the long memory 
parameter also entails a log-regression based on some particular diverging 
statistics, namely, the periodogram ordinates at frequencies near zero. 

5.1. A second example: long memory time series. Long memory time series
are typically defined via an underlying stationary, mean zero, purely
nondeterministic Gaussian time series $\{G_t, t \in \mathbb{Z}\}$ with autocovariance
$R(k) = \mathrm{Cov}(G_0, G_k)$ that is not absolutely summable. So assume that $X_t = h(G_t)$,
where $h$ is some measurable function satisfying $Eh^2(G_t) < \infty$. Also assume
that $R(k) = k^{-\beta} L(k)$ as $k \to \infty$, where $L$ is some slowly varying function,
and $\beta > 0$ is some unknown constant termed the long memory parameter. If
$\beta > 1$, then the series $\{X_t\}$ and $\{G_t\}$ are weakly dependent, and the following
central limit theorem typically holds:

(9) $\sqrt{n}\,(\bar X_n - \mu) \Rightarrow N\left(0, \sum_{k=-\infty}^{\infty} R_X(k)\right)$,

where $\bar X_n = n^{-1} \sum_{t=1}^{n} X_t$, $R_X(k) = \mathrm{Cov}(X_0, X_k)$ and $\mu = EX_t$. If $\beta < 1$,
then the sequences $\{X_t\}$ and $\{G_t\}$ are said to be long-range dependent and
neither of them is strong mixing; see Ibragimov and Rozanov [13]. Hence, the
subsampling methodology of Politis and Romano [18] may not be applicable,
and the same is true for the subsampling rate estimator of Bertail, Politis
and Romano [3] and Politis, Romano and Wolf [19], Chapter 8. In the
long-range dependence case of $\beta < 1$, the following is true:

(10) $n(\bar X_n - \mu)/d_n \Rightarrow W_q$,

where $d_n = n^{1 - q\beta/2} L^{q/2}(n)$, and $q$ is the Hermite rank of $h$; see Taqqu [20,
21]. It is often the case that $q = 1$, in which case the limit distribution $W_1$ is
a mean-zero Gaussian; for $q \ge 2$, $W_q$ is not Gaussian. Nevertheless, just the
existence of the limit distributions in lines (9) and (10) is enough to imply
that the techniques of Section 4 are applicable. In particular, a consistent
estimator of the product $q\beta$ can be constructed using the sample mean as
the converging statistic in Theorem 4.1; if $q$ is known, then this immediately
yields an estimator of the long memory parameter $\beta$.

Different statistics could also be used; one example is the familiar second
sample moment $S_n^2 = n^{-1} \sum_{t=1}^{n} X_t^2$ that was the focus of Section 2. Since limit
theorems analogous to (9) and (10) hold for the second sample moment, our
rate estimation method of Theorem 4.1 could be based on the converging
statistic $S_n^2$. The second sample moment $S_n^2$, however, is especially useful
as it can be transformed to a diverging statistic as suggested in Remark
4.1. To do this, we simply let $T_n = \sum_{t=1}^{n} X_t^2 = nS_n^2$. It is easy to see that
the requirements of Theorem 3.1 are satisfied for the diverging statistic $T_n$,
and thus, a "bandwidth-free," consistent estimator of the product $q'\beta$ can
be built based on $T_n$; here, of course, $q'$ denotes the Hermite rank of the
function $h^2$.

5.2. A third example: heavy tails with long memory. Consider a time
series defined as $X_t = \sqrt{\varepsilon_t}\, G_t$ for $t \in \mathbb{Z}$, where the series $\{\varepsilon_t\}$ and $\{G_t\}$ are
independent, and the $\varepsilon_t$'s are positive and i.i.d. with distribution in $D(\alpha/2)$
for some $\alpha \in (0, 2)$, and $\{G_t\}$ is stationary Gaussian with mean zero and
autocovariance $R(k)$. For some $\zeta \in [0, 1)$, define the condition

LM($\zeta$): $\sum_{|h|<n} R(h) \sim Cn^{\zeta}$ and $\sum_{|h|<n} |R(h)| = O(n^{\zeta})$ as $n \to \infty$,

where $C > 0$ is a constant. As before, the series $\{X_t\}$ and $\{G_t\}$ are said
to have long memory if LM($\zeta$) holds with $\zeta \in (0, 1)$, in which case the long
memory parameter $\beta$ equals $1 - \zeta$; the case LM(0) denotes weak dependence.

Interestingly, when appropriately normalized, the sample second moment 
converges in distribution in this general setting as the following proposition 
demonstrates; see Gomes, de Haan and Pestana [11] and McElroy and Politis 
[15] for related results. 

Proposition 5.1. In the setting described above [including condition
LM($\zeta$)], suppose that $\varepsilon_t$ is absolutely continuous with a probability density
$f_\varepsilon$ that is bounded and ultimately monotone, that is, $f_\varepsilon$ is monotone on
$(z, \infty)$ for some $z > 0$, and is monotone on $(-\infty, u)$ for some $u < 0$. Then
we have

$a_n^{-2} \sum_{t=1}^{n} X_t^2 \Rightarrow W$ as $n \to \infty$,

where $a_n = n^{1/\alpha} K(n)$ for some slowly varying function $K(n)$. In the above,
$W$ is $\alpha/2$-stable with scale $C_{\alpha/2}^{-2/\alpha} (E|G_t|^{\alpha})^{2/\alpha}$, skewness 1 and location zero,
and the constants $C_p$ are defined by $C_p^{-1} = \Gamma(2 - p) \cos(\pi p/2)/(1 - p)$.

The limit theorem of Proposition 5.1 is interesting, because the
convergence of the sample second moment does not depend on the long memory
parameter, and hence, our methods from Sections 3 and 4 can be
unambiguously applied to estimate $\alpha$. Other methods in the tail index estimation
literature may well encounter serious difficulties in this context, being sensitive
to long-range dependence; this seems to be true for the Hill estimator; see
Embrechts, Klüppelberg and Mikosch [8]. It is also true for the
subsampling rate estimator of Bertail, Politis and Romano [3]; see the discussion in
Section 5.1.

6. A small simulation experiment. We now revisit the setup of Section
2, that is, data $X_1, \ldots, X_n$ from a linear time series; see Remark 3.3. To
perform the simulation, the AR(1) model

$$X_t = \rho X_{t-1} + Z_t \tag{11}$$

was employed with $\rho = -0.5$, $0.1$ or $0.7$ and $\{Z_t\}$ i.i.d. from a distribution
$F \in D(\alpha)$. The distributions used were (i) $\{Z_t\} \sim$ i.i.d. Cauchy, (ii) $\{Z_t\} \sim$
i.i.d. 1.5-Stable (symmetric), (iii) $\{Z_t\} \sim$ i.i.d. 1.9-Stable (symmetric), (iv)
$\{Z_t\} \sim$ i.i.d. $N(0,1)$, (v) $\{Z_t\} \sim$ i.i.d. Pareto(2, 1), (vi) $\{Z_t\} \sim$ i.i.d. Burr(2,
1, 0.5) and (vii) $Z_t = \tilde Z_t \cdot \max(1, \log_{10} |\tilde Z_t|)$, where $\{\tilde Z_t\} \sim$ i.i.d. Burr(2, 1,
0.5). The variation (vii) has as its purpose the construction of a nonnormal
domain of attraction, that is, the case where the slowly-varying function $L$
is not constant; see Embrechts, Klüppelberg and Mikosch [8].
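As a rough illustration of how data from model (11) might be generated, the following sketch simulates the AR(1) recursion for three of the innovation laws listed above. The function name `simulate_ar1`, the zero starting value, the burn-in length and the random seed are assumptions made for this example only; the reading of Pareto(2, 1) as a classical Pareto law with tail index 2 and scale 1 is likewise an assumption.

```python
import numpy as np

def simulate_ar1(n, rho, innovations, burn_in=200):
    """Return the last n values of X_t = rho * X_{t-1} + Z_t, started at zero.

    `innovations` is a callable taking a size and returning that many draws.
    The burn-in is a hypothetical choice to reduce the effect of the start value.
    """
    z = innovations(n + burn_in)
    x = np.empty(n + burn_in)
    prev = 0.0
    for t in range(n + burn_in):
        prev = rho * prev + z[t]
        x[t] = prev
    return x[burn_in:]

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

innovation_samplers = {
    "cauchy": lambda size: rng.standard_cauchy(size),
    "normal": lambda size: rng.standard_normal(size),
    # NumPy's pareto() draws from the Lomax law; adding 1 gives the classical
    # Pareto with P(Z > z) = z**(-2) for z >= 1 (assumed parametrization).
    "pareto": lambda size: 1.0 + rng.pareto(2.0, size),
}

series = {name: simulate_ar1(1000, 0.7, sampler)
          for name, sampler in innovation_samplers.items()}
```

Each entry of `series` is one stretch of length 1,000 for $\rho = 0.7$; repeating the call gives independent replications for a Monte Carlo study.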

For each combination of the value of $\rho$ and the distribution $F$, 100 time
series stretches were generated, each of length $n = 1{,}000$. From each series,
the estimator $\hat\alpha$ was computed, where $\hat\alpha$ was defined in Section 2; also computed
were the improved versions $\hat\alpha^{*}$ and $\hat\alpha^{**}$, that is, the mean and median
of the values of $\hat\alpha$ based on scans as in Corollary 3.2. Note that the information
that $0 < \alpha \le 2$ was explicitly used in that values of $\hat\alpha$ bigger than
2 were truncated to the value 2; interestingly, no occurrences of a negative
$\hat\alpha$ were observed. This truncation is necessary for good performance of $\hat\alpha^{*}$,
but is superfluous for $\hat\alpha^{**}$ since the latter is based on a median that "clips"
outliers.
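The effect of the truncation on the mean and on the median can be seen in a small sketch. The function name `combine_scan_estimates` and the toy values are hypothetical; only the truncation at 2 and the use of a mean and a median over scans come from the text.

```python
import numpy as np

def combine_scan_estimates(scan_estimates, upper=2.0):
    """Combine per-scan estimates by mean and median, truncating at `upper`."""
    est = np.asarray(scan_estimates, dtype=float)
    truncated = np.minimum(est, upper)  # values above `upper` are set to `upper`
    return {
        "mean_raw": float(est.mean()),              # sensitive to the outlier
        "mean_truncated": float(truncated.mean()),  # mean after truncation
        "median": float(np.median(truncated)),      # barely affected by truncation
    }

# Toy input: four plausible estimates and one wild value.
example = combine_scan_estimates([1.2, 1.4, 1.5, 1.6, 9.0])
```

In this toy case the single outlier moves the raw mean far above 2, while the truncated mean and the median stay close to the bulk of the values.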

A number of scanning algorithms can be devised; the website 
www.math.ucsd.edu/~politis/PAPER/scansAlgorithms.pdf presents Algo- 
rithms A, B and A(/), making the claim that Algorithm A(/) — with a 
carefully chosen / — may be preferable. However, Algorithm A(/) is very 
computer-intensive. Although this is not a problem for the practitioner with 
a single dataset at hand, it is prohibitive in terms of conducting a simula- 
tion with thousands of datasets. A computational shortcut is presented by 
Algorithm B' that is valid for weakly dependent stationary sequences only. 
In particular, it is not suitable for the "long-memory" series of Section 5; 
see the aforementioned website for more details. 

The results of our simulation, where $N$ random scans were generated using
Algorithm B', are summarized in Table 1 where the empirical mean squared
error (MSE) of each estimator is given. In this setup, the benchmark for
comparison among estimators of $\alpha$ is given by the Hill estimator $H_q$ based
on $q$ extreme order statistics. Empirical MSEs of Hill estimators are given in
Table 2 for different values of $q$. Also included in Table 2 are the (empirically
found) true optimal values of $q$, denoted by $q_{\mathrm{opt}}$; in other words, the MSE of
$H_{q_{\mathrm{opt}}}$ was the smallest MSE empirically computed from the model in question
over a wide range of $q$ values. Things to note are the following:

• Averaging over scans does indeed succeed in dramatically reducing the 
MSE of estimation. As a matter of fact, even with N as low as 100, signif- 
icant benefits ensue, typically halving the MSE of the original estimator; 
this is of course contingent on having those N scans generated in a very 
"random" fashion as Algorithm B' ensures. 

• The comparison between $\hat\alpha^{*}$ and $\hat\alpha^{**}$ is unclear. The former seems to
lead to somewhat smaller MSEs, but it should be borne in mind that its
performance is aided by the truncation of the original estimator to the
value 2. On the other hand, $\hat\alpha^{**}$ is more robust, and thus recommendable
in a general setup when outside information, such as the restriction $\alpha \in (0,2]$,
may not be available.

Table 1

Empirical MSEs of estimators of the heavy tail index $\alpha$; data from model (11) with
$n = 1{,}000$ and (a) $\rho = 0.1$, (b) $\rho = 0.7$, (c) $\rho = -0.5$

                         $\hat\alpha^{*}$                        $\hat\alpha^{**}$
        $\hat\alpha$     N=20    N=100   N=200      N=20    N=100       N=200

(a)
(i)     0.315    0.223   0.102   0.096      0.329   0.098       0.085
(ii)    0.171    0.109   0.064   0.064      0.152   0.107       0.109
(iii)   0.051    0.036   0.025   0.024      0.044   0.041       0.037
(iv)    0.006    0.004   0.002   0.002      0.004   (<0.0005)   (<0.0005)
(v)     0.222    0.190   0.142   0.140      0.220   0.167       0.166
(vi)    0.294    0.159   0.079   0.079      0.228   0.106       0.101
(vii)   0.319    0.156   0.074   0.068      0.260   0.106       0.096

(b)
(i)     0.328    0.193   0.108   0.106      0.265   0.127       0.109
(ii)    0.161    0.097   0.057   0.055      0.147   0.101       0.093
(iii)   0.078    0.046   0.034   0.033      0.059   0.058       0.052
(iv)    0.011    0.009   0.006   0.005      0.010   0.002       0.001
(v)     0.120    0.079   0.079   0.077      0.091   0.088       0.084
(vi)    0.343    0.205   0.105   0.103      0.312   0.112       0.107
(vii)   0.314    0.189   0.062   0.060      0.295   0.102       0.097

(c)
(i)     0.322    0.234   0.145   0.138      0.322   0.156       0.145
(ii)    0.139    0.097   0.055   0.052      0.151   0.091       0.086
(iii)   0.054    0.046   0.026   0.028      0.056   0.040       0.044
(iv)    0.008    0.005   0.003   0.003      0.007   0.001       (<0.0005)
(v)     0.254    0.193   0.164   0.169      0.218   0.202       0.210
(vi)    0.295    0.204   0.089   0.079      0.283   0.123       0.109
(vii)   0.321    0.151   0.064   0.056      0.237   0.105       0.097

Comparing Table 1 to Table 2, it is apparent that both $\hat\alpha^{*}$ and $\hat\alpha^{**}$ underperform
as compared to the optimized Hill estimator $H_{q_{\mathrm{opt}}}$ in cases (i), (ii),
(v) and (vi), whereas $\hat\alpha^{*}$ and $\hat\alpha^{**}$ perform comparably to $H_{q_{\mathrm{opt}}}$ in cases (iii)
and (vii). Both $\hat\alpha^{*}$ and $\hat\alpha^{**}$ perform excellently in the Gaussian case (iv);
however, as kindly pointed out by one of the referees, the Hill estimator is
inapplicable/inconsistent in this case since it diverges to infinity; hence
the n/a's in Table 2.
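For readers who want to reproduce the benchmark, a minimal sketch of the textbook Hill estimator of the tail index is given below. It uses the $q$ largest absolute values of the sample and the $(q+1)$th largest as threshold; whether the simulation used exactly this variant (absolute values, this normalization) is an assumption, and the name `hill_estimator` is hypothetical.

```python
import numpy as np

def hill_estimator(x, q):
    """Textbook Hill estimate of the tail index from the q largest |x| values.

    With a[0] >= a[1] >= ... the decreasing order statistics of |x|, this
    returns q / sum_{i<q} (log a[i] - log a[q]).
    """
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    if not 1 <= q < a.size:
        raise ValueError("q must satisfy 1 <= q < len(x)")
    if a[q] <= 0.0:
        raise ValueError("threshold order statistic must be positive")
    log_excess = np.log(a[:q]) - np.log(a[q])
    return float(q / np.sum(log_excess))

# Deterministic toy input: |x| = exp(0), exp(1), exp(2), exp(3).
toy_value = hill_estimator(np.exp([0.0, 1.0, 2.0, 3.0]), q=2)
```

In the toy example the two largest log-values exceed the threshold log-value by 2 and 1, so the estimate is 2/3. For light-tailed data such as the Gaussian case (iv) the log-excesses shrink as the sample grows, which is consistent with the divergence noted above.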

Perhaps it should be stressed that $q_{\mathrm{opt}}$ is not known by the practitioner.
As mentioned earlier, estimation of $q_{\mathrm{opt}}$ is not a trivial matter and is further
complicated when the data are dependent; see Embrechts, Klüppelberg and
Mikosch [8] or Danielsson et al. [7] and the references therein. This phenomenon
is manifested in our simulations, especially in cases (v)-(vii), that
is, the Pareto and Burr distributions, for which the value of the empirically
found $q_{\mathrm{opt}}$ seems to be quite unstable as a function of the dependence factor
$\rho$, to the extent exemplified in our small simulation.

Table 2

Empirical MSEs of Hill estimator $H_q$ based on $q$ order statistics; data from model (11)
with $n = 1{,}000$ and (a) $\rho = 0.1$, (b) $\rho = 0.7$, (c) $\rho = -0.5$

        $H_{100}$   $H_{200}$   $H_{300}$   $H_{400}$   $H_{q_{\mathrm{opt}}}$   $q_{\mathrm{opt}}$

(a)
(i)     0.011   0.011   0.048   0.170   0.007   140
(ii)    0.121   0.013   0.130   0.546   0.013   200
(iii)   1.469   0.043   0.290   1.147   0.017   220
(iv)    n/a     n/a     n/a     n/a     n/a     n/a
(v)     0.149   0.291   0.450   0.629   0.086   40
(vi)    0.032   0.065   0.099   0.138   0.027   60
(vii)   0.094   0.106   0.134   0.167   0.059   20

(b)
(i)     0.045   0.019   0.051   0.136   0.019   200
(ii)    0.253   0.039   0.135   0.562   0.031   220
(iii)   1.262   0.050   0.315   1.198   0.035   220
(iv)    n/a     n/a     n/a     n/a     n/a     n/a
(v)     0.373   0.147   0.059   0.026   0.026   400
(vi)    0.057   0.023   0.017   0.022   0.017   300
(vii)   0.048   0.064   0.078   0.087   0.048   100

(c)
(i)     0.017   0.012   0.042   0.155   0.011   180
(ii)    0.135   0.015   0.118   0.532   0.013   220
(iii)   1.297   0.042   0.286   1.111   0.015   220
(iv)    n/a     n/a     n/a     n/a     n/a     n/a
(v)     0.184   0.421   0.727   1.072   0.118   40
(vi)    0.038   0.084   0.141   0.219   0.034   60
(vii)   0.104   0.138   0.183   0.252   0.052   20

The simulation confirms that our proposed methodology leads to reasonable
estimates of the index of domain of attraction under (linear) dependence
and possibly nonnormal domains of attraction, that is, nonconstant slowly-varying
function $L$. Nevertheless, it should be stressed that our methodology
has general applicability, and it is not specific to the particular context as
Hill's estimator is. Of course, it is expected that context-specific, carefully
optimized estimators may give improved performance relative to this general
"off-the-shelf" tool. The fact that in some of the cases considered, for example,
(iii) and (vii), our general methodology performs comparably with the
optimally fine-tuned Hill estimator (using the true $q_{\mathrm{opt}}$) can be considered
remarkable.

An added bonus of our methodology is that it is totally automatic: no
fine-tuning is required in terms of a tricky "bandwidth"-type choice, such
as estimating $q_{\mathrm{opt}}$ for the Hill estimator. In addition, note that, even in the
specific tail estimation context of this section, our methodology is applicable
in connection with different diverging statistics other than the second
moment. There is a plethora of such diverging estimators that can be used;
for example, $T_n$ could be taken as the $2r$th sample moment for some integer
$r > 1$, the $r$th sample moment of the absolute values of the $X_t$'s for some integer
$r > 2$, the maximum $M_n = \max\{X_1, \ldots, X_n\}$ or the range $K_n = M_n - L_n$,
where $L_n = \min\{X_1, \ldots, X_n\}$.
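The in-sample evolution of several such statistics can be computed in one pass. The sketch below evaluates each statistic on the nested prefixes $X_1, \ldots, X_k$ for $k = 1, \ldots, n$; the use of prefixes (rather than some other scan), the particular moment orders and the name `statistic_paths` are assumptions made for illustration.

```python
import numpy as np

def statistic_paths(x):
    """Evaluate several candidate statistics on the prefixes X_1..X_k, k = 1..n."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, x.size + 1)
    running_max = np.maximum.accumulate(x)  # M_k
    running_min = np.minimum.accumulate(x)  # L_k
    return {
        "second_moment": np.cumsum(x ** 2) / k,             # sample 2nd moment
        "abs_third_moment": np.cumsum(np.abs(x) ** 3) / k,  # 3rd absolute moment
        "maximum": running_max,
        "range": running_max - running_min,                 # M_k - L_k
    }

paths = statistic_paths([1.0, -2.0, 3.0])
```

Each array in `paths` could then be fed, after taking logarithms of absolute values, into a regression on $\log k$ of the kind analysed in the Appendix.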

The performance of those different candidate statistics is context-specific,
and will generally depend on many factors, including the underlying value
of $\alpha$ as well. Furthermore, since all these different statistics yield useful
information for $\alpha$, it is conceivable that they can all be combined to construct
an improved estimator. To give a concrete example, let $\hat\alpha^{**(r)}$ denote
our median-averaged estimator of $\alpha$ based on the $r$th sample moment
of the absolute values as the diverging statistic. The estimators $\hat\alpha^{**(r)}$ for
$r = 2, 3, \ldots, R$ can be constructed for some fixed integer $R$ whose magnitude
will depend on the practitioner's computational facilities. Those $R$ estimators
can then be combined to yield the yet improved estimator

$$\hat\alpha^{**,R} = \operatorname{median}\bigl(\hat\alpha^{**(2)}, \ldots, \hat\alpha^{**(R)}\bigr). \tag{12}$$



APPENDIX: TECHNICAL PROOFS 



Proof of Proposition 2.1. By a corollary of Karamata's representation
theorem, see, for example, Theorem A3.3 in Embrechts, Klüppelberg
and Mikosch [8], it follows that $\log L(n)/\log n$ asymptotically behaves as
$\int_z^n (\delta(u)/u)\,du / \log n$ for some number $z > 0$ and a measurable function $\delta(u)$
that tends to zero as $u \to \infty$. If the integral converges, then the assertion is
proved; otherwise use l'Hôpital's rule to obtain an asymptotic rate of $\delta(n)$,
which tends to zero. Thus, $\log L(n)/\log n = o(1)$. □
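A quick numerical check of this conclusion for one concrete slowly varying function may be helpful. The choice $L(n) = \log n$ and the grid of sample sizes are assumptions made purely for illustration.

```python
import math

def log_ratio(n):
    """Return log L(n) / log n for the slowly varying choice L(n) = log n."""
    return math.log(math.log(n)) / math.log(n)

# Evaluate on a geometric grid of sample sizes.
ratios = [log_ratio(10.0 ** p) for p in (2, 4, 8, 16, 32)]
```

The ratios decrease along the grid but only slowly, which is a reminder that the $o(1)$ term coming from $L$ can be far from negligible at realistic sample sizes.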

Proof of Theorem 3.1. Consider the identity

$$Y_k = g(\lambda) \log k + U_k - \log L(k) \tag{13}$$

for $k = 1, \ldots, n$. After some straightforward calculations, we have that $\hat g =
g(\lambda) + a_1 + a_2$, where

$$a_1 = \frac{\sum_{k=1}^{n} U_k \log k}{\sum_{k=1}^{n} \log^2 k} \quad\text{and}\quad a_2 = -\frac{\sum_{k=1}^{n} \log L(k) \log k}{\sum_{k=1}^{n} \log^2 k}.$$

Note that

$$c_1 \log n < \overline{\log n} < c_2 \log n \quad\text{and}\quad c_3 (\log n)^2 < n^{-1} \sum_{k=1}^{n} (\log k)^2 < c_4 (\log n)^2,$$

for some constants $c_i > 0$. Since it is assumed that $U_n = O_P(1)$, it follows
that $a_1 = O_P(1/\log n) = o_P(1)$. Using line (2), it follows that $a_2 = o(1)$ as
well. Hence, $\hat g = g(\lambda) + o_P(1)$. Finally, part (i) is proven by an application
of the continuous mapping theorem.

To analyze $\tilde\lambda$, note that similarly we have $\tilde g = g(\lambda) + A_1 + A_2$, where

$$A_1 = \frac{\sum_{k=1}^{n} (U_k - \bar U)(\log k - \overline{\log n})}{\sum_{k=1}^{n} (\log k - \overline{\log n})^2}, \qquad A_2 = -\frac{\sum_{k=1}^{n} (\log L(k) - \overline{\log L})(\log k - \overline{\log n})}{\sum_{k=1}^{n} (\log k - \overline{\log n})^2};$$

here $\bar U = \frac{1}{n} \sum_{k=1}^{n} U_k$ and $\overline{\log L} = \frac{1}{n} \sum_{k=1}^{n} \log L(k)$. Let ^ = £L4 ]. By a
Riemann-sum approximation argument, it follows that

$$n^{-1} \sum_{k=1}^{n} (\log k - \overline{\log n})^2 = n^{-1} \sum_{k=1}^{n} \bigl(\log (k/n) - \overline{\log k/n}\bigr)^2 \to \int_0^1 (\log x + 1)^2 \, dx = 1, \tag{14}$$

where $\overline{\log k/n} = \frac{1}{n} \sum_{k=1}^{n} \log (k/n)$. Focus on the numerator of $A_2$: defining
$L_n(x) = L(\lceil nx \rceil)$ such that $L_n(k/n) = L(k)$, we obtain

$$\frac{1}{n} \sum_{k=1}^{n} (\log L(k) - \overline{\log L})(\log k - \overline{\log n}) = \frac{1}{n} \sum_{k=1}^{n} \log L_n(k/n) \bigl(\log (k/n) - \overline{\log k/n}\bigr) = \int_0^1 \log L_n(x) (\log x + 1) \, dx + o(1)$$

by a straightforward application of the definition of the Riemann integral.
Note that the error in this approximation is just $o(1)$ instead of $O(1/n)$,
since $\log x$ does not have a bounded derivative on $[0,1]$. From Theorem
A3.3 of Embrechts, Klüppelberg and Mikosch [8], we have the representation
$L(y) = c(y) \exp\{\int_z^y (\eta(u)/u) \, du\}$ for some $z > 0$, $c(x) \to c > 0$ and $\eta(x) \to 0$
as $x \to \infty$. It follows that

$$\log L_n(x) - \log L_n(1) = \log (c_{\lceil nx \rceil} / c_n) - \int_{\lceil nx \rceil}^{n} (\eta(u)/u) \, du.$$

So for each fixed $x \in (0,1]$, $c_{\lceil nx \rceil} / c_n \to 1$ and

$$\left| \int_{\lceil nx \rceil}^{n} (\eta(u)/u) \, du \right| \le \sup_{u \in [nx, n]} |\eta(u)| \cdot \frac{n(1-x)}{\lceil nx \rceil},$$

which tends to zero as $n \to \infty$. This shows that $\log L_n(x) - \log L_n(1) = o(1)$.
But since $\log L_n(1)$ does not depend on $x$ and $\int_0^1 (\log x + 1) \, dx = 0$,

$$\int_0^1 \log L_n(x) (\log x + 1) \, dx = \int_0^1 \bigl(\log L_n(x) - \log L_n(1)\bigr)(\log x + 1) \, dx \to 0$$

by the dominated convergence theorem, since the integrand converges uniformly
to zero. This shows that $A_2 = o(1)$.
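The Riemann-sum limit in (14) is easy to check numerically. The sketch below computes the centred sum of squared logarithms for a given $n$; the function name `centred_log_sum` and the two sample sizes are assumptions for this example.

```python
import numpy as np

def centred_log_sum(n):
    """Return n^{-1} * sum_k (log k - mean_j log j)^2 for k, j = 1..n."""
    log_k = np.log(np.arange(1, n + 1))
    return float(np.mean((log_k - log_k.mean()) ** 2))

value_small = centred_log_sum(10 ** 3)
value_large = centred_log_sum(10 ** 6)
```

The value approaches 1 as $n$ grows, in line with $\int_0^1 (\log x + 1)^2 \, dx = 1$; the convergence is slow near the singularity of $\log x$ at zero, which matches the remark that the approximation error is only $o(1)$.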

Part (ii). Now assume (4). From line (14), we also have that

$$A_1 = n^{-1} \sum_{k=1}^{n} U_k \bigl(\log (k/n) - \overline{\log k/n}\bigr) + o(1),$$

which we will denote by $I_1$. Now since $EU_n^2 \to EU^2$, we find that $\sup_n EU_n^2 <
\infty$ so that $\{U_n\}$ is a uniformly integrable sequence. Together with $U_n \Rightarrow U$,
this implies that $EU_n \to EU$ as $n \to \infty$, and also that for each $x$, $EU_n(x) \to
EU$ with $U_n(x) = U_{\lceil nx \rceil}$. Hence we calculate

$$E I_1 = \frac{1}{n} \sum_{k=1}^{n} EU_n(k/n) \bigl(\log (k/n) - \overline{\log k/n}\bigr) = o(1) + \int_0^1 EU_n(x) (\log x + 1) \, dx \to \int_0^1 EU (\log x + 1) \, dx = 0$$

using the dominated convergence theorem. Hence, $E\tilde g = g(\lambda) + EA_1 + A_2 =
g(\lambda) + o(1)$. Now observe that

$$\operatorname{Var}(A_1) \sim n^{-2} \sum_{k=1}^{n} \sum_{b=1}^{n} \operatorname{Cov}(U_b, U_k) (\log b - \overline{\log n})(\log k - \overline{\log n}).$$

So from (4) it follows that $\operatorname{Cov}(U_b, U_k) = O(1)$; thus, line (14) implies that
$\operatorname{Var}(A_1) = O(1)$, completing the proof of part (ii).

Part (iii). Now assume (5) as well;

$$E I_1 = \frac{1}{n} \sum_{k=1}^{n} (EU_k - EU)(\log k - \overline{\log n}),$$

since $EU$ does not depend on $k$. Taking absolute values produces a crude
bound of $\frac{1}{n} \sum_{k=1}^{n} C k^{-p} \log k$ for some constant $C > 0$; thus, $EI_1$ is clearly
$O(n^{-p} \log n)$, and $A_2$ has already been analyzed.

Part (iv). Finally, assume (4) and (6). Then we have

1 2 n n 

Var(Ao) < ]T £ | Cov(U b , U k )\ = 0(n^ (log 2 n)L(n)) = o(l), 

71 fc=16=l 

which shows that $\tilde g \xrightarrow{P} g(\lambda)$ and hence $\tilde\lambda \xrightarrow{P} \lambda$ as well. □
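The two least-squares slopes analysed in this proof, one through the origin and one with an intercept, can be sketched as follows. The function names and the synthetic input are hypothetical; the formulas are the ordinary least-squares slopes of $Y_k$ on $\log k$ that correspond to the decompositions $g(\lambda) + a_1 + a_2$ and $g(\lambda) + A_1 + A_2$.

```python
import numpy as np

def slope_through_origin(y):
    """Least-squares slope of Y_k on log k with no intercept, k = 1..n."""
    y = np.asarray(y, dtype=float)
    log_k = np.log(np.arange(1, y.size + 1))
    return float(np.sum(y * log_k) / np.sum(log_k ** 2))

def slope_with_intercept(y):
    """Least-squares slope of Y_k on log k with an intercept, k = 1..n."""
    y = np.asarray(y, dtype=float)
    log_k = np.log(np.arange(1, y.size + 1))
    centred = log_k - log_k.mean()
    return float(np.sum((y - y.mean()) * centred) / np.sum(centred ** 2))

# Synthetic check: exact power law with exponent 0.5, then a constant shift
# that mimics a constant slowly varying factor (an illustrative assumption).
k = np.arange(1, 501)
y_exact = 0.5 * np.log(k)
y_shifted = y_exact + 3.0
```

On the shifted input the slope with intercept still recovers 0.5, while the slope through the origin is biased upward; this illustrates why centring removes the contribution of a constant $\log L(k)$.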

Proof of Proposition 3.1. To see this, one has to look at the last
block of a scan and deconstruct it, that is, go backward. Since the last
block is always $(X_1, \ldots, X_n)$, the next-to-last is either $(X_1, \ldots, X_{n-1})$ or
$(X_2, \ldots, X_n)$, that is, one of two choices. Similarly, from any step of the
process, for example, from the block of size $k + 1$, there are two choices
for the preceding block corresponding to shrinking from the left or from
the right. Thus, there are two choices for each of the $n - 1$ steps of the
deconstruction of the last blocks; these choices multiply to give the number
$2^{n-1}$. □
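The backward deconstruction in this proof translates directly into an enumeration. The sketch below represents each block by a half-open index range and is only practical for small $n$; the function name `enumerate_scans` and the index convention are assumptions for illustration.

```python
from itertools import product

def enumerate_scans(n):
    """List all scans of X_1..X_n as tuples of (start, stop) index ranges.

    Each scan is built backward from the full block (0, n) by dropping one
    point from the left ("L") or from the right ("R") at each of n - 1 steps,
    then reversed so that block sizes run 1, 2, ..., n.
    """
    scans = []
    for choices in product("LR", repeat=n - 1):
        lo, hi = 0, n
        blocks = [(lo, hi)]
        for choice in choices:
            if choice == "L":
                lo += 1  # shrink from the left
            else:
                hi -= 1  # shrink from the right
            blocks.append((lo, hi))
        scans.append(tuple(reversed(blocks)))
    return scans

scans_4 = enumerate_scans(4)
```

For $n = 4$ this yields 8 distinct scans, in agreement with $2^{n-1}$; for realistic $n$ the count is astronomically large, which is why scans are sampled at random rather than enumerated.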

Proof of Corollary 3.2. Regarding $\hat\lambda^{*}$ and $\tilde\lambda^{*}$, the proof follows by
a simple application of the Cauchy-Schwarz inequality. Regarding $\hat\lambda^{**}$ and
$\tilde\lambda^{**}$, just note that they represent medians of $N$ i.i.d. random variables where
$N$ is finite. □

Proof of Theorem 4.1. We first show that $\lambda_{m,b} \xrightarrow{P} \lambda$ under the
assumptions of Theorem 4.1 together with the additional assumption that
$m \to \infty$. First note that if we define

$$U_k = \log \bigl(k^{-g(\lambda)} L(k) |T_k - T_n|\bigr) \quad\text{for } k = m, \ldots, b + m,$$

then the identity (13) still holds true but now for $k = m, \ldots, b + m$ only. In
addition, we also have $U_n = O_P(1)$ as $n \to \infty$ as before. To see this, note
that $k^{-g(\lambda)} L(k) (T_k - \mu) \Rightarrow J$ as $k \to \infty$ by assumption (8). Also note

$$k^{-g(\lambda)} L(k) (T_k - \mu) = k^{-g(\lambda)} L(k) (T_k - T_n + T_n - \mu) = k^{-g(\lambda)} L(k) (T_k - T_n) + A_0,$$

where $A_0 = k^{-g(\lambda)} L(k) (T_n - \mu)$. But

$$|A_0| = k^{-g(\lambda)} L(k) |T_n - \mu| = \frac{k^{-g(\lambda)} L(k)}{n^{-g(\lambda)} L(n)} \, n^{-g(\lambda)} L(n) |T_n - \mu| = O_P\bigl((k/n)^{-g(\lambda)}\bigr),$$

again by assumption (8). Since $k/n \le (b + m)/n = o(1)$ and $g(\lambda) < 0$, it follows
that $A_0 = o_P(1)$. Finally, Slutsky's theorem and the continuous mapping
theorem ensure that $k^{-g(\lambda)} L(k) |T_k - T_n| \Rightarrow |J|$ as $k \to \infty$ and hence,
$U_k = O_P(1)$ as $k \to \infty$. By a calculation similar to that in the proof of Theorem
3.1, we have that $g_{m,b} = g(\lambda) + A_1 + A_2$, where now

$$A_1 = \frac{\sum_{k=m}^{b+m} (U_k - \bar U)(\log k - \overline{\log})}{\sum_{k=m}^{b+m} (\log k - \overline{\log})^2}$$

and

$$A_2 = -\frac{\sum_{k=m}^{b+m} (\log L(k) - \overline{\log L})(\log k - \overline{\log})}{\sum_{k=m}^{b+m} (\log k - \overline{\log})^2};$$

here $\bar U$, $\overline{\log L}$ and $\overline{\log}$ denote the averages of $U_k$, $\log L(k)$ and $\log k$ over
$k = m, \ldots, b + m$. As in the proof of
Theorem 3.1, it follows that, as $b \to \infty$, $A_2 = o(1)$ and $EA_1 = o(1)$ by
equation (4). Moreover, $\operatorname{Var} A_1 = o(1)$ by equation (6), and hence, $g_{m,b} =
g(\lambda) + o_P(1)$. An application of the continuous mapping theorem shows that
$\lambda_{m,b} \xrightarrow{P} \lambda$.

We now wish to relax the extra assumption $m \to \infty$. To do this, we will
show that $m = 1$ is a good enough choice, that is, that $\lambda_{1,b} \xrightarrow{P} \lambda$ when $b \to \infty$
but $b = o(n)$; the proof for other nondiverging choices for $m$ is similar. Note
that by the above arguments we can write $\lambda_{1,b} = g^{-1}(g_{1,b})$, where

$$g_{1,b} = g(\lambda) + A_1^{*} + A_2^{*};$$

in the above, $A_1^{*}, A_2^{*}$ are similar to the terms $A_1, A_2$ but with summations of
the type $\sum_{k=1}$ instead of $\sum_{k=m}$ in both numerator and denominator. Now
consider a choice of $m$ satisfying $m \to \infty$ but also $m = O(b)$. By the above
discussion, we have shown that $\lambda_{m,b-m+1} \xrightarrow{P} \lambda$. In particular, we can write

$$g_{m,b-m+1} = g(\lambda) + A_1' + A_2',$$

where $A_1', A_2'$ are again similar to the terms $A_1^{*}, A_2^{*}$ but with summations
of the type $\sum_{k=m}$ instead of $\sum_{k=1}$; furthermore, we have also shown that
$A_1', A_2'$ are both $o_P(1)$. Looking at the numerator of $A_1'$, we see a sum of the
type $\sum_{k=m}$ which we have shown to be of order $O_P((b - m) \log (b - m))$. The
denominator of $A_1'$ includes a sum of the type $\sum_{k=m}$ which is of exact order
$O((b - m) \log^2 (b - m))$. Now writing those sums as $\sum_{k=m} = \sum_{k=1} - \sum_{k=1}^{m-1}$ in
both numerator and denominator of $A_1'$, and using the assumption $m = O(b)$,
it follows that $A_1' = A_1^{*} + o_P(1)$. Similarly, $A_2' = A_2^{*} + o_P(1)$. Since $A_1', A_2'$
are both $o_P(1)$, it is immediate that $A_1^{*}, A_2^{*}$ are both $o_P(1)$; thus, $g_{1,b} =
g(\lambda) + o_P(1)$, and $\lambda_{1,b} \xrightarrow{P} \lambda$ as desired. □

Proof of Proposition 5.1. The proof is available at the website 
www.math.ucsd.edu/~politis/PAPER/scansAppendixl.pdf. □ 